Genetic polymorphisms associated with susceptibility to COVID-19 disease and severity: A systematic review and meta-analysis

Although advanced age and the presence of comorbidities significantly impact the variation observed in the clinical symptoms of COVID-19, it has been suggested that genetic variants may also be involved in the disease. Thus, the aim of this study was to perform a systematic review with meta-analysis of the literature to identify genetic polymorphisms that are likely to contribute to COVID-19 pathogenesis. The PubMed, Embase, and GWAS Catalog repositories were systematically searched to retrieve articles that investigated associations between polymorphisms and COVID-19. For polymorphisms analyzed in 3 or more studies, pooled odds ratios (OR) with 95% confidence intervals (CI) were calculated using random or fixed effect models in Stata software. Sixty-four eligible articles were included in this review. In total, 8 polymorphisms in 7 candidate genes and 74 alleles of the HLA loci were analyzed in 3 or more studies. The HLA-A*30 and CCR5 rs333Del alleles were associated with protection against COVID-19 infection, while the ApoE rs429358C allele was associated with risk for this disease. Regarding COVID-19 severity, the HLA-A*33, ACE1 Ins, and TMPRSS2 rs12329760T alleles were associated with protection against severe forms, while the HLA-B*38, HLA-C*06, and ApoE rs429358C alleles were associated with risk for severe forms of COVID-19. In conclusion, polymorphisms in the ApoE, ACE1, TMPRSS2, CCR5, and HLA loci appear to be involved in the susceptibility to and/or severity of COVID-19.


Introduction
Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in China near the end of 2019 and progressed to a pandemic in March 2020, resulting in a major public health problem worldwide due to its social and economic burdens [1]. As of February 1, 2022, COVID-19 had affected more than 370 million people and caused more than 5,658,702 deaths (https://www.who.int/publications/m/item/weekly-operational-update-on-covid-19-1-february-2022). Clinical manifestations of COVID-19 range from asymptomatic infection to dry cough, sore throat, fever, shortness of breath, fatigue, muscle pain, headache, loss of taste or smell, vomiting, diarrhea, and acute respiratory distress syndrome. Approximately 15% of patients develop the severe form, which can progress to pneumonia, respiratory failure, kidney injury, multiorgan dysfunction, and death [2,3]. The variation in symptoms and severity of COVID-19 is partially explained by known risk factors, including advanced age, male gender, and the presence of comorbidities, such as diabetes, obesity, hypertension, and heart disease [4,5]. However, severe outcomes have also been observed in young and healthy patients, suggesting that other risk factors, such as genetic predisposition, may increase the risk and/or severity of this disease [6][7][8].
It is well known that host genetic polymorphisms play a key role in the susceptibility or resistance to different viral infections [9,10]. Taking into account the main role of host genes in the entry and replication of SARS-CoV-2 in cells and in mounting the immune response, it seems that a combination of multiple genes might be involved in COVID-19 pathogenesis [9]. Accordingly, numerous studies have investigated the association between genetic polymorphisms and COVID-19 [6,7,9,10,11]. Some studies have indicated that polymorphisms in genes related to the innate and adaptive immune response [toll-like receptors (TLRs), human leukocyte antigen (HLA) class I and II, and cytokines/chemokines] and in genes involved in viral binding and entry into host cells [angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease (TMPRSS)] are associated with COVID-19 development and/or severity [6,7,8,12]. However, it is still unclear which specific polymorphisms contribute to the susceptibility to this disease, and to what degree [6].
Thus, aiming to identify the genetic factors that may influence COVID-19 susceptibility and severity, we conducted a comprehensive and updated systematic review of the literature on the subject, followed by meta-analyses of those polymorphisms analyzed in three or more studies. To date, only a few systematic reviews have been published regarding the association between polymorphisms in different genes and COVID-19 [6,7,10,12].

Literature search strategy and eligibility criteria
This comprehensive and updated systematic review was performed and written according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Meta-analysis of Observational Studies in Epidemiology (MOOSE) statements and the guideline for Systematic Reviews of Genetic Association Studies [13][14][15], and it was registered at PROSPERO (http://www.crd.york.ac.uk/PROSPERO) under the number CRD42021248091. We searched the PubMed and Embase repositories for all original articles in English, Portuguese, or Spanish that analyzed potential associations between genetic polymorphisms and susceptibility to or severity of COVID-19, published up to July 2021. For this, the following MeSH terms were used: (SARS-CoV-2 OR COVID-19 OR severe acute respiratory syndrome OR SARS virus) AND (polymorphism, genetic OR polymorphism, single nucleotide OR polymorphism, single-stranded conformational OR polymorphism, restriction fragment length OR DNA copy number variations OR amplified fragment length polymorphism analysis OR mutation OR mutation rate OR INDEL mutation OR mutation, missense OR point mutation OR frameshift mutation OR codon, nonsense). In addition, studies of interest were also searched in the GWAS Catalog (https://www.ebi.ac.uk/gwas).
Two independent investigators (C.D. and L.A.B.) screened and evaluated the eligibility of each study retrieved from the online repositories by reviewing titles and abstracts. When abstracts did not provide adequate information, the full texts of the extracted articles were also reviewed, as previously reported by our group [16,17]. Discrepancies between the two investigators were settled by discussion between them and, when necessary, a third reviewer (D.C.) was consulted. All observational human studies that compared frequencies of at least one polymorphism between patients with and without COVID-19 or between COVID-19 patients with different degrees of severity were included in this systematic review. Moreover, the reference lists of the articles fulfilling our eligibility criteria were manually searched to identify other potentially relevant citations.
The exclusion criteria were: 1) articles without enough data to estimate an OR with 95% CI; 2) duplicated studies (in this case, the most complete study was chosen for inclusion); and 3) non-human studies.

Data extraction and quality evaluation
Necessary information from each study was independently extracted by C.D. and L.A.B. using a standardized form [16,17]. Agreement was pursued in all evaluated items of this form; however, when an agreement could not be reached, divergences in data extraction were resolved by referring to the original article or by consulting another investigator (D.C.). Data retrieved from each study were as follows: 1) characteristics of the studies and samples (including publication year, name of first author, number of subjects in each analyzed group, mean age, gender, country, and ethnicity); and 2) data of the polymorphisms of interest [including their identification, allele/genotype frequencies, and OR (95% CI)]. When data were not available in the article, the authors were contacted by email for the necessary information, but only some of them responded.
The Clark-Baudouin Score (CBS) was used to evaluate the quality of the included studies [18]. This score applies pre-defined criteria to assess each publication, highlighting quality issues in the conduct of studies and the interpretation of results. Using a 10-point scoring sheet, investigators can evaluate sections of the articles related to reproducibility, selection of subjects, statistical analyses, and genotyping methods.

Statistical analyses for meta-analysis
Those polymorphisms analyzed in three or more studies were submitted to meta-analyses using the Stata 15.0 software (StataCorp, College Station, TX, USA). Goodness-of-fit χ² tests were used to evaluate whether genotype frequencies were in conformity with the Hardy-Weinberg Equilibrium (HWE) in the control groups. Associations between individual polymorphisms and COVID-19 susceptibility and/or severity were analyzed using OR (95% CI) calculations for the allele contrast, dominant, recessive, and additive inheritance models, categorized as suggested by a previous publication [19]. For the HLA allelic analysis, frequency was calculated as the number of cases or controls harbouring at least one positive event (one allele type) divided by the total number of chromosomes included in each of the corresponding groups [20]. Inter-study heterogeneity was tested using the χ²-based Cochran's Q statistic, while inconsistency was quantified with the I² metric [21,22]. When P < 0.10 (Q statistic) and/or I² > 50%, heterogeneity was considered statistically relevant. In this case, the DerSimonian and Laird random effect model (REM) was used to calculate OR (95% CI) for each study and for the pooled effect. In the absence of significant inter-study heterogeneity, the fixed effect model (FEM) was used for this calculation.
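The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Stata routine used in this review: the function names (`chi2_sf`, `hwe_test`, `pool_odds_ratios`), the layout of each 2×2 table, the inverse-variance weighting of the fixed effect model, and the 0.5 continuity correction for zero cells are all choices made for the example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    # Start from Q(1, t) for even df or Q(1/2, t) for odd df, then apply the
    # recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    if df % 2 == 0:
        a, prob = 1.0, math.exp(-t)
    else:
        a, prob = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0:
        prob += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return prob


def hwe_test(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square (1 df) of control genotype counts to HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_sf(chi2, 1)


def pool_odds_ratios(tables, p_threshold=0.10, i2_threshold=50.0):
    """Pool 2x2 tables (a, b, c, d) = (cases with, cases without,
    controls with, controls without the allele or genotype)."""
    if len(tables) < 2:
        raise ValueError("at least two studies are needed to assess heterogeneity")
    logs, variances = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):  # assumed continuity correction for empty cells
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        logs.append(math.log((a * d) / (b * c)))
        variances.append(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)

    # Fixed effect (inverse-variance) estimate, Cochran's Q, and I^2.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logs)) / sum(w)
    q_stat = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logs))
    df = len(tables) - 1
    p_q = chi2_sf(q_stat, df)
    i2 = max(0.0, (q_stat - df) / q_stat) * 100.0 if q_stat > 0 else 0.0

    # Switch to DerSimonian-Laird weights when heterogeneity is relevant.
    heterogeneous = p_q < p_threshold or i2 > i2_threshold
    if heterogeneous:
        c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q_stat - df) / c_term) if c_term > 0 else 0.0
        w = [1.0 / (v + tau2) for v in variances]

    pooled = sum(wi * yi for wi, yi in zip(w, logs)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return {
        "model": "random" if heterogeneous else "fixed",
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "q": q_stat,
        "p_q": p_q,
        "i2": i2,
    }
```

Calling `pool_odds_ratios` with one `(a, b, c, d)` tuple per study returns the pooled OR, its 95% CI, and the heterogeneity statistics that determined the model choice; `hwe_test` takes the three genotype counts of a control group.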

Literature search
Fig 1 shows the flow diagram illustrating the strategy used to identify and select studies for inclusion in our systematic review and meta-analyses. A total of 2936 articles were retrieved after searching the PubMed, Embase, and GWAS Catalog resources, and 2727 of them were excluded during the review of titles and abstracts because they did not meet our eligibility criteria. Two hundred and nine articles remained for full-text evaluation. After careful analysis of the full texts, another 145 studies were excluded, and a total of 64 articles were included in this systematic review (Table 1 and Fig 1). Among them, 30 studies, in which the same SNP was evaluated in at least 3 articles and frequency data were available, were included in the meta-analyses.
Qualitative synthesis of studies that analyzed associations of SNPs and COVID-19
Table 1 shows the compiled main data of the 64 eligible studies included in this systematic review. More than 200 polymorphisms and 50 genes/loci were studied regarding their associations with COVID-19 susceptibility or severity of this disease. Most of the studies compared polymorphism frequencies between patients who tested positive for COVID-19 and negative controls. Twenty-three studies evaluated polymorphisms in COVID-19 patients categorized according to different degrees of disease severity. S1 Table shows the quality of all studies included in this systematic review, which was evaluated using the CBS as described in the Methods Section. Considering a score system that ranges from 0 to 10 points according to the adherence to pre-defined criteria, none of the studies reached 9 points. However, the majority of the studies (70.1%) were classified as presenting good quality since they were awarded 6 to 8 points. The remaining articles were awarded fewer than 6 points. More information regarding the COVID-19 diagnostic criteria, definition of severity degrees, age, ethnicity, gender, and genotyping techniques is described in S2 Table. The most studied candidate genes/loci were: HLA, ABO, ACE1, ACE2, APOE, CCR5, TMPRSS2, and IFITM3. In total, 8 polymorphisms in 7 candidate genes and 74 alleles of the HLA loci (A, B, C, DRB1, DQA1, and DQB1) were analyzed in ≥3 studies and subsequently included in the meta-analyses.
Table 1 (excerpt): British; 688 cases; S1R; Severity: The S1R rs17775810 T/T genotype was associated with the lowest death rate (0%, P = 0.020).

Meta-analyses of HLA alleles
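The A, B, C, DRB1, DQB1, and DQA1 alleles of the HLA were analyzed according to the risk of COVID-19 (S3 Table) or the severity of the disease (S4 Table). Four alleles were associated with COVID-19: HLA-A*30 with protection against infection, HLA-A*33 with protection against the severe forms, and HLA-B*38 and HLA-C*06 with risk for the severe forms (Fig 3). Our meta-analyses demonstrated that the other 70 alleles of the A, B, C, DRB1, DQB1, and DQA1 loci were not associated with COVID-19 development or severity (S3 and S4 Tables).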

Meta-analyses of CCR5 and IFITM3 polymorphisms
Three studies were included in the meta-analyses of the CCR5 rs333 (Ins/Del) polymorphism regarding the risk of COVID-19 and its severity [30,36,46] (Table 2). The Del allele was associated with protection against COVID-19 infection (Fig 4A), including under the dominant model (OR = 0.82, 95% CI 0.68-0.98); however, this polymorphism was not associated with the severity of the disease (Table 2).
For the IFITM3 rs12252 (T/C) polymorphism, the pooled analyses of 4 studies [24,42,72,84] indicated no association between this polymorphism and different degrees of COVID-19 severity for any of the tested genetic models (Table 2).

Discussion
Elucidating the genetic determinants of SARS-CoV-2 infection is essential for understanding the pathophysiology of COVID-19 and the inter-individual variability in its severity, thereby contributing to the development of updated vaccines and new antivirals. Hence, in this systematic review, we summarized the results of 64 eligible articles that analyzed the association between genetic polymorphisms and risk for infection or severity of COVID-19. Moreover, data regarding polymorphisms in 8 genes (HLA, ABO, ACE1, ACE2, APOE, CCR5, TMPRSS2, and IFITM3) were meta-analyzed in relation to the risk of infection and severity of COVID-19. Pooled results demonstrated that polymorphisms in the ApoE, ACE1, TMPRSS2, CCR5, and HLA genes appear to be involved in the susceptibility to and/or severity of COVID-19.
Angiotensin-converting enzyme 2 (ACE2) and type II transmembrane serine protease (TMPRSS2) are candidate genes for susceptibility to SARS-CoV-2 infection since SARS-CoV-2 uses the ACE2 receptor for cell entry, while the serine protease TMPRSS2 is required for priming of the viral spike (S) protein [86,87]. ACE2 and ACE1, together with renin and angiotensin, constitute the renin angiotensin aldosterone system (RAAS), a complex system involved in multiple biological processes that regulate blood pressure homeostasis, extracellular volume, and inflammation. Inflammation is closely related to COVID-19 morbidity and mortality, as it affects bradykinin production [88,89]. Following viral entry, ACE2 is downregulated, causing an ACE1/ACE2 imbalance and contributing to RAAS overactivation and pulmonary shutdown. The consequent increased ACE1 activity and reduced ACE2 expression increase the risk of pulmonary diseases by increasing lung vascular permeability, thus leading to lung damage [90][91][92]. Accordingly, studies have reported associations between polymorphisms in the ACE1, ACE2, and TMPRSS2 genes and SARS-CoV-2 infection [28,32,33,41,44,48,52,58,60,61,63,68,73,77,83]; however, the results are still contradictory. In the present meta-analysis, two ACE2 polymorphisms (rs2285666 and rs41303171) were analyzed, but no association with COVID-19 was found. Nevertheless, we demonstrated an association between the T allele of the TMPRSS2 rs12329760 polymorphism and protection against the most severe form of COVID-19.
Regarding the ACE1 gene, the insertion/deletion (Ins/Del) of 287 bp in the Alu sequence of intron 16, represented by four individual SNPs (rs4646994, rs1799752, rs4340 and rs13447447), modulates ACE1 expression [93][94][95]. This Ins/Del variant results in alternative splicing, leading to protein shortening and loss of the catalytically active domain in ACE1 Ins allele carriers [92]. Moreover, the ACE1 Ins/Del variant explains about 60% of the variability in ACE1 levels in the general population since ACE1 levels in Ins/Ins carriers are approximately half of those in Del/Del carriers [39,93,96]. In the context of SARS-CoV-2 infection, studies have reported that variations in COVID-19 recovery and prevalence rates are associated with the ACE1 Ins/Del frequency and the geographical variations of this variant [97,98]. Here, we showed an association between the ACE1 Ins allele and protection against severe COVID-19.
Major histocompatibility complex genes (MHC, known as Human Leukocyte Antigens, HLA) play a critical role in immune response [99]. The HLA system is a remarkably polymorphic region and genetic variants of HLA have been reported to affect the clinical course of patients infected with different viruses [100], including SARS-CoV-1 [101]. A specific set of HLA will present the peptides of the degraded virus to receptors on T cells, thus eliciting an immune response for virus eradication [102]. The set of HLA alleles inherited by an individual will determine the immune responses to viruses according to the selected peptides that can bind to the peptide-binding groove [102]. Studies in different populations have shown associations between some HLA class I (A, B, and C) and class II (DRB1, DQA1, and DQB1) alleles and COVID-19 susceptibility and/or severity [82,103]. Our meta-analyses did not confirm the results of previous individual studies; however, we identified new HLA alleles associated with COVID-19: the HLA-A*30 and HLA-A*33 alleles were associated with protection against COVID-19 infection and the most severe form of this disease, respectively. In addition, the HLA-B*38 and HLA-C*06 alleles were associated with risk for severe COVID-19.
The interferon-induced transmembrane 3 (IFITM3) is an IFN-stimulated gene (ISG) essentially expressed on endosomes and lysosomes [104]. IFITM3 is part of an ISG family (IFITM) responsible for inhibiting the fusion between viral and cellular membranes of many viruses, such as influenza A H1N1 virus, dengue virus, and SARS-CoV [104]. On the other hand, it was recently shown that IFITM proteins are cofactors for efficient SARS-CoV-2 infection in human cells [105], reaffirming a key role of this gene in the susceptibility to COVID-19. Nevertheless, here, the IFITM3 rs12252 polymorphism was not associated with COVID-19 severity. Of note, we did not analyze this polymorphism regarding COVID-19 infection susceptibility due to the lack of studies. Although this SNP in the IFITM3 gene was not associated with COVID-19, it is noteworthy that type I IFN (IFN-I)-stimulated immunity has been shown to influence COVID-19 severity. Inborn errors of the IFN-I pathway and pre-existing autoantibodies neutralizing IFN-I appear to be strong determinants of critical COVID-19 pneumonia in about 15-20% of patients [106]. Asano et al. [107] reported that deleterious X-linked TLR7 mutations were observed in 16 male subjects from a cohort of 1202 patients with unexplained critical COVID-19 pneumonia. The patients' blood plasmacytoid dendritic cells (pDCs) produced low levels of IFN-I in response to SARS-CoV-2. Human TLR7 and pDCs are essential for protective IFN-I immunity against SARS-CoV-2 in the respiratory tract. Moreover, Zhang et al. [108] showed that inborn errors of TLR3- and IRF7-dependent IFN-I immunity can cause life-threatening COVID-19 pneumonia in patients with no prior severe infection.
Chemokines act to maintain immune homeostasis and to defend the body against harmful stimuli, such as SARS-CoV-2 infection [109]. CCR5 encodes a chemokine receptor expressed in macrophages and T cells, and its upregulation has been confirmed in COVID-19 patients [110]. Furthermore, an anti-CCR5 treatment has been shown to relieve the symptoms and the cytokine storm in COVID-19 patients who are critically ill [109]. The CCR5 gene is located at 3p21.31, a gene cluster region associated with severe COVID-19 courses [39]. The most studied CCR5 polymorphism regarding COVID-19 susceptibility is the Δ32 Ins/Del (rs333) [30,34,36,46]. The CCR5 rs333 Del allele results in loss of function of the protein and is a major determinant of resistance to HIV infection, since the CCR5 protein serves as one of the gateways for HIV [111]. Accordingly, our meta-analysis showed that the CCR5 rs333 Del allele was associated with protection against COVID-19 infection [34,36,46].
A Genome-Wide Association Study (GWAS) carried out by the Severe COVID-19 GWAS Group [39] reported that one of the 2 strongest signals associated with severe COVID-19 was located within the ABO blood-group system. The involvement of ABO blood groups in COVID-19 susceptibility has been reported in both genetic and non-genetic studies. Blood group O was previously associated with a lower risk of acquiring COVID-19 when compared to non-O blood groups, whereas blood group A was associated with a higher risk for this disease than non-A blood groups [39]. One of the assumptions is that the A-antigen causes P-selectin and intercellular adhesion molecule 1 binding to endothelial cells, increasing the probability of cardiovascular disease. Another explanation is that individuals with blood group O have decreased levels of von Willebrand factor, lowering the thrombotic disease risk [reviewed in [103]]. The rs8176719 polymorphism is the main determinant of the O blood group and has been investigated as a potential marker of COVID-19 susceptibility. However, some studies did not confirm these findings [35,38]. In our meta-analysis, we demonstrated that the ABO rs8176719 -/C SNP was associated neither with COVID-19 infection nor with different stages of severity.
The ApoE ε4 genotype was investigated in the UK Biobank Cohort, being associated with COVID-19 severity and mortality [51]. This finding was replicated in other studies [37,46]. Apolipoprotein E (ApoE) is broadly expressed in human tissues and has an essential role in lipid transport, which has a key role in many functions, including immunity [112]. The most studied polymorphisms in ApoE are the rs429358 (ApoE4, C/T) and rs7412 (ApoE2, C/T), both located in exon 4. Three haplotypes are generated from these two polymorphisms (ε2, ε3 and ε4), encoding 3 protein isoforms (E2, E3 and E4). Moreover, these haplotypes can combine into 6 different genotypes: ε2/ε2, ε2/ε3, ε2/ε4, ε3/ε3, ε3/ε4, and ε4/ε4 [112]. Among them, the ancestral ApoE ε4/ε4, generally considered deleterious, is a significant risk factor for Alzheimer's disease and other human pathologies, including type 2 diabetes and cardiovascular disease, which are known risk factors for worse outcomes of COVID-19 [112][113][114]. In the present meta-analysis, the pooled data of three studies confirmed the association of the ε4 allele with both the risk of COVID-19 and severe outcomes of the disease. It has been hypothesized that elevated cholesterol and oxidized lipoprotein levels, linked to the effects of the ApoE ε4/ε4 variant, are associated with increased pneumocyte susceptibility to infection and with exaggerated lung inflammation [112]. Moreover, the frequency of the ε4 allele is higher in African-Americans, who had increased mortality due to COVID-19 compared to Caucasian populations [115].
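The relationship between the two SNPs, the three haplotypes, and the six genotypes can be made concrete with a short Python sketch. The allele-to-haplotype table below follows the commonly used definition (rs429358 C together with rs7412 C for ε4, rs429358 T together with rs7412 T for ε2) and is an assumption added for illustration, as are the names `APOE_HAPLOTYPES` and `call_haplotype`; alleles are assumed to be reported on the same strand as in the text above.

```python
from itertools import combinations_with_replacement

# (rs429358 allele, rs7412 allele) -> ApoE haplotype; assumed standard mapping.
APOE_HAPLOTYPES = {
    ("T", "T"): "ε2",
    ("T", "C"): "ε3",
    ("C", "C"): "ε4",
}


def call_haplotype(rs429358, rs7412):
    """Return the ApoE haplotype for one chromosome, or None if the
    allele combination is not one of the three common haplotypes."""
    return APOE_HAPLOTYPES.get((rs429358.upper(), rs7412.upper()))


# Unordered pairs of the three haplotypes give the six possible genotypes.
haplotypes = ["ε2", "ε3", "ε4"]
genotypes = ["/".join(pair) for pair in combinations_with_replacement(haplotypes, 2)]
print(genotypes)
```

The sketch works on phased alleles (one chromosome at a time); inferring haplotypes from unphased genotype calls at the two SNPs would need an additional step that is not shown here.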
The results of the present meta-analysis should be interpreted within the context of a few limitations. Inter-study heterogeneity is common in meta-analyses of genetic association studies and should be cautiously interpreted. Some included studies did not test the control groups for COVID-19 or included controls derived from previous databanks or ecological studies without COVID-19 information. Moreover, the COVID-19 severity criteria varied among the studies. Some studies included asymptomatic patients, while others only included patients with at least a given symptom. Due to the presence of more than 2 groups of COVID-19 severity stages (mild, moderate and severe), we categorized the patients regarding COVID-19 severity in different ways; however, it was more rational to show the data comparing the most severe group against the other groups (asymptomatic and/or mild plus moderate). It was not possible to evaluate the association with mortality, as only a few studies presented data comparing COVID-19 survivors and non-survivors. Furthermore, the impact of gender and age, which may influence the predisposition to COVID-19, could not be assessed due to the small number of studies for each SNP. Genetic background among different populations may significantly influence COVID-19 susceptibility, and the studies included in the present meta-analysis comprised different ethnicities. However, due to the small number of studies for each ethnicity, we were not able to analyze the impact of genetic background on the results. Finally, we cannot rule out that small negative studies were overlooked, since we could not perform the publication bias analysis due to the small number of studies for each SNP.
Infection with SARS-CoV-2 and its clinical course are dependent on the complex relationship between the virus and the host immune system. In this meta-analysis, we identified, for the first time, that four alleles of the HLA class I loci (A*30, A*33, B*38 and C*06) are associated with COVID-19. Moreover, we confirmed the association between COVID-19 susceptibility and polymorphisms in the ApoE, ACE1, TMPRSS2, and CCR5 genes. These findings will guide further epidemiological studies on host genetics as well as the development of innovative treatments. Considering that specific genetic polymorphisms might lead to severe COVID-19 outcomes, it is of extreme importance to use individual genetic data to employ personalized therapeutics and improve the COVID-19 prognosis.
Supporting information
S1 Table. Clark-Baudouin quality assessment scale for the studies included in the systematic review.